Speaker Dependent Bengali Keyword Spotting in Unconstrained English Speech

Author

  • Samarjit Das
Abstract

A project report submitted during a summer internship under the supervision of Prof.

Multi-lingual interfaces can be of great use in a number of applications. An important first step for such systems is to identify which segments of an utterance belong to a specific language, since language boundary information is needed before any further processing can be done. Language-specific keyword spotting can be used for this purpose, so a word spotter of this kind can serve as an integral part of a typical multi-lingual system.

In this project, a speaker-dependent Bengali keyword spotter for unconstrained English speech was developed. Two approaches were used. Both used whole-word HMMs for the keywords, and all the Bengali keywords were trained as isolated words. The first approach used a whole-word filler model. The second approach modelled the filler part with trained English phoneme models connected in an all-phone grammar network.

The whole-word approach reached an optimal performance of a 94.22% hit rate at 1.17 false alarms per keyword per hour (FA/KW/H); its maximum hit rate was 97.92%, but at the cost of 7.03 FA/KW/H. The second approach reached an optimal performance of a 95.83% hit rate at just 0.71 FA/KW/H. Its maximum hit rate was the same as that of the first approach (97.92%), but with a lower false alarm rate of 4.45 FA/KW/H. Ways of improving performance by reducing false alarms are also proposed. Finally, further development of the existing system into a speaker-independent Bengali keyword spotter is discussed.
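The two figures quoted in the abstract, percentage hit rate and FA/KW/H, can be illustrated with a minimal sketch. It assumes the commonly used definitions: hit rate is detected keyword occurrences divided by true keyword occurrences, and FA/KW/H is the number of false alarms divided by the product of the keyword vocabulary size and the hours of speech scored. The function names and all counts below are hypothetical and are not taken from the report, which does not state its test-set sizes here.

```python
def hit_rate_percent(hits: int, keyword_occurrences: int) -> float:
    """Percentage of true keyword occurrences that the spotter detected."""
    if keyword_occurrences <= 0:
        raise ValueError("keyword_occurrences must be positive")
    return 100.0 * hits / keyword_occurrences


def fa_per_kw_per_hour(false_alarms: int, num_keywords: int, speech_hours: float) -> float:
    """False alarms normalised by vocabulary size and hours of scored speech."""
    if num_keywords <= 0 or speech_hours <= 0:
        raise ValueError("num_keywords and speech_hours must be positive")
    return false_alarms / (num_keywords * speech_hours)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the report).
    hits, occurrences = 45, 48                # detected vs. true keyword occurrences
    false_alarms, vocabulary, hours = 6, 10, 0.5

    print(f"hit rate: {hit_rate_percent(hits, occurrences):.2f}%")
    print(f"FA/KW/H:  {fa_per_kw_per_hour(false_alarms, vocabulary, hours):.2f}")
```

Normalising false alarms by both vocabulary size and duration is what makes operating points such as those in the abstract comparable across systems with different keyword lists and test-set lengths. Lowering the detection threshold typically raises the hit rate and FA/KW/H together, which is the trade-off the abstract reports between the optimal and maximum-hit settings.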

Similar resources

An Application of Recurrent Neural Networks to Discriminative Keyword Spotting

Keyword spotting is a detection task consisting in discovering the presence of specific spoken words in unconstrained speech. The majority of keyword spotting systems are based on generative hidden Markov models and lack discriminative capabilities. However, discriminative keyword spotting systems are based on the estimation of a posteriori probabilities at the frame-level, hence they make use ...

Language independent and unsupervised acoustic models for speech recognition and keyword spotting

Developing high-performance speech processing systems for low-resource languages is very challenging. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to train a multi-language bottleneck DNN. Language dependent and/or multi-language (all training languages) Tandem acoustic models are then trained. This work con...

A Vocabulary-independent Keyword Spotter for Spontaneous Chinese Speech

HarkMan keyword-spotter was designed so that it can be used in a real-world environment to automatically spot the given words of a vocabulary-independent (VIND) task in unconstrained Chinese telephone speech. In this spotter, the speaking manner and the number of keywords are not limited. This paper focuses on a novel technique that addresses acoustic modeling, keyword-spotting network, search ...

Spotting Subsequences matching a HMM using the Average Observation Probability Criteria with application to Keyword Spotting

This paper addresses the problem of detecting keywords in unconstrained speech. The proposed algorithms search for the speech segment maximizing the average observation probability along the most likely path in the hypothesized keyword model. As known, this approach (sometimes referred to as sliding model method) requires a relaxation of the begin/endpoints of the Viterbi matching, as well as a...

Speaker-dependent Speech Recognition Based on Phone-like Units Models | Application to Voice Dialing

This paper presents a speaker dependent speech recognition with application to voice dialing. This work has been developed under the constraints imposed by voice dialing applications, i.e., low memory requirements and limited training material. Two methods for producing speaker dependent word baseforms based on Phone Like Units (PLU) are presented and compared: (1) a classical vector quantizer...

Publication date: 2005